home *** CD-ROM | disk | FTP | other *** search
Text File | 1994-09-12 | 14.8 KB | 528 lines | [TEXT/ttxt] |
- ZEBU -- AN LALR(1) PARSER SYSTEM IN SCHEME
- Version .9
-
- Copyright (C) 1989, by William M. Wells III
- All Rights Reserved
- Permission is granted for unrestricted non-commercial use.
-
-
-
-
-
- INTRODUCTION
-
- ZEBU is a lalr(1) parser generator package. It facilitates the
- automatic construction of syntax driven processors within the
- SCHEME [2] programming environment. Zebu bears resemblance to
- the UNIX program YACC [1].
-
- Zebu reads grammar files that contain the description of a
- language as well as the actions to be taken as the language is
- parsed. The language is described by production rules, while the
- actions are described by associated lambda expressions.
-
- Zebu generates parse table files which describe an lr parsing
- machine that processes the specified language.
-
- Zebu has a parser driver component which processes the specified
- language using the parse table file.
-
- A good discussion of lalr(1) parser generators and syntax
- directed translation appears in PRINCIPLES OF COMPILER DESIGN
- [3].
-
-
-
-
- I. Sample Session
-
- Zebu is best introduced by way of example. The following is a
- transcript of an interaction with Zebu.
-
- ;;; Zebu is used to generate the parse table file "ex1.tab"
- ;;; from grammar file "ex1.grm"
-
- 1 ]=> (compile-slr-grammar "ex1.grm" "ex1.tab")
- reading grammar from ex1.grm, start symbols is: E
- 7 productions, 11 symbols
- Dumping parse tables to ex1.tab
- ;No value
-
- ;;; The resulting parse table file is loaded into Zebu's
- ;;; parser driver:
- 1 ]=> (load-parse-tables "ex1.tab")
- ;No value
-
- ;;; Zebu's parser driver is used to parse an expression
- ;;; from the language described in grammar file "ex1.grm"
-
- 1 ]=> (list-parser '(ned "+" jed))
-
-
-
-
- -- 1 --
-
-
-
-
-
-
- ;Value: (EXPRESSION (EXPRESSION (TERM (FACTOR NED)))
- "+" (TERM (FACTOR JED)))
-
- 1 ]=> (list-parser '(fred "+" ned "*" jed))
- ;Value: (EXPRESSION (EXPRESSION (TERM (FACTOR FRED)))
- "+" (TERM (TERM (FACTOR NED)) "*" (FACTOR JED)))
-
-
-
-
- II. About Grammar Files
-
- This is the grammar file for the example above.
-
-
- (E E "+" T)
- (LAMBDA (EE PLUS TT) (LIST 'expression EE PLUS TT))
-
- (E T)
- (LAMBDA (TT) (LIST 'expression TT))
-
- (T T "*" F)
- (LAMBDA (TT TIMES F) (LIST 'term TT TIMES F))
-
- (T F)
- (LAMBDA (F) (LIST 'term F))
-
- (F "(" E ")")
- (LAMBDA (LP EE RP) (LIST 'factor LP EE RP))
-
- (F IDENTIFIER)
- (LAMBDA (ID) (list 'factor id))
-
-
-
- The file contains alternating encodings of grammar productions
- and associated actions to be performed at parsing time.
-
- The productions are encoded in scheme lists. The first element
- of the list is the left hand side of the production; the rest of
- the list is the right hand side of the production. Thus a
- production which normally might be written as:
-
-
- E -> E + T
-
- is encoded as:
-
- (E E "+" T) .
-
- Grammar symbols may be written either as scheme symbols or scheme
- strings. For Zebu's parsers, tokens are classified according to
- their relation to the grammar symbols in the sense of (equal?) .
-
- Zebu assumes that the start symbol of the grammar appears on the
-
-
-
-
- -- 2 --
-
-
-
-
-
-
- left hand side of the first production.
-
- The associated lambdas are applied during parsing. As the lr
- parser processes the input text it performs "reductions" using
- the productions of the grammar. When a reduction is performed,
- the action (lambda) associated with the production is applied.
-
- The arguments to the lambdas arise in two ways. The arguments to
- the lambda are in correspondence with the right hand side of the
- production. If an argument corresponds to a terminal grammar
- symbol, then the parser driver supplies the instantiation of the
- grammar symbol as seen by the tokenizer. If an argument
- corresponds to a non-terminal symbol, then the driver supplies
- the result of the application of the lambda that is associated
- with the production from which the non-terminal was reduced.
- This is not as complicated as it sounds. The value returned by
- the parser is the value given by the application of the lambda
- associated with the production which is used to do the final
- reduction of the input text to the start symbol.
-
- The lambdas in the above example are arranged to construct a
- parse tree corresponding to the parse of the input text.
-
- The lambdas may use side effects, if desired. The sequencing of
- the application of the lambdas is governed by the ordering of the
- reduce operations of the lr parser driver.
-
- The lambdas are used only at parsing time, not table generation
- time. If the Zebu parser driver isn't going to be used, tables
- may be built without supplying meaningful lambdas. As far as
- parse table construction is concerned, the lambdas may be
- replaced with arbitrary scheme expressions as place holders (such
- as #f).
-
- The format of Zebu's parse table files is described in the source
- file dump.scm.
-
-
-
-
-
- III. LALR(1) and SLR(1)
-
- Zebu has two parser generating algorithms. (COMPILE-SLR-GRAMMAR
- ...) requires an slr(1) grammar, while (COMPILE-LALR1-GRAMMAR
- ...) requires a lalr(1) grammar.
-
- Although all slr(1) languages are lalr(1), the reverse is not
- true. (compile-slr-grammar ...) is faster than (compile-lalr1-
- grammar ...) and so may be preferred if the language at hand is
- slr(1).
-
- The example grammar file "ex6-2.grm" is an example of a grammar
- which is lalr(1) but not slr(1).
-
-
-
-
-
- -- 3 --
-
-
-
-
-
-
-
-
-
-
- IV. Using Ambiguous Grammars
-
- A simple feature allows the Zebu system to be used with some
- ambiguous grammars.
-
- If the variable ALLOW-CONFLICTS is true, then when conflicts
- arise at table construction time, the system simply prefers the
- actions already in the tables. This feature works for the
- "dangling else" problem.
-
- Here is an example of building parse tables for a "dangling else"
- grammar:
-
- 1 ]=> (set! allow-conflicts #t)
- ;Value: ()
- 1 ]=> (compile-slr-grammar "dangelse.grm" "dangelse.tab")
- reading grammar from dangelse.grm, start symbols is: S
- 4 productions, 7 symbols
-
- ACTION CONFLICT!!! -- state: 6 old entry: (5 s 2)
- new entry: (5 r 2)
- WARNING: continuing to build tables despite conflicts...
- Will prefer old entry: (5 s 2)
- Dumping parse tables to dangelse.tab
- ;No value
-
- 1 ]=> (load-parse-tables "dangelse.tab")
-
- ;Value: (("+" . 4) ("*" . 6) ("(" . 8) (")" . 9) (IDENTIFIER . 10))
-
- 1 ]=> (list-parser '(i i a e a))
-
- ;;; Note that the tables "do the right thing" with the dangling
- ;;; else construct. (The else is associated with the nearest
- ;;; un-elsed if).
-
- ;Value: (I (I (A) E (A)))
-
- The reported action conflict is a "shift-reduce" conflict that
- has been resolved in favor of shifting.
-
- When conflicts arise, it may be useful to use (PRINT-COLLECTION
- #F) or (PRINT-COLLECTION #T) which dump the lr0 item sets (which
- correspond to the states of the parser) in order to understand
- the problem the parser is facing. (See the source file
- tables.l for details.)
-
- Obtaining useful parse tables from ambiguous grammars requires
- care. PRINCIPLES OF COMPILER DESIGN [2] has a good discussion
- about using ambiguous grammars.
-
-
-
-
-
- -- 4 --
-
-
-
-
-
-
-
-
-
- V. The Parser Driver and Tokenizers
-
- Zebu has two components: the parser generator (the bulk of the
- code) and a lr parser driver (contained in the file driver.scm).
- The parser generator is needed only when parse table files are
- being generated.
-
- The parser driver is quite simple. It contains a lr parsing
- engine (LR-PARSE) which may be used with arbitrary user supplied
- tokenizers. (LR-PARSE) is configured using (LOAD-PARSE-TABLES).
-
- Two parsers are supplied which use (LR-PARSE). (LIST-PARSER)
- will parse a token stream appearing as a scheme list. (READ-
- PARSER) will parse a token stream gotten via the scheme function
- (READ).
-
- For some applications, the scheme function (READ) may not be a
- suitable tokenizer. In these cases, a custom tokenizer should be
- constructed for use with (LR-PARSE).
-
- The two parsers (LIST-PARSER) and (READ-PARSER) have special case
- behavior on the terminal symbols NUMBER and IDENTIFIER. If the
- terminal symbol NUMBER is present in the grammar, then the token
- classifier will categorize objects that satisfy the scheme
- function (NUMBER?) as instances of the terminal symbol NUMBER.
- Other objects which match terminal symbols in the grammar will be
- categorized as such. Otherwise objects will be categorized as
- instances of the terminal symbol IDENTIFIER, if it is present in
- the grammar.
-
-
-
- APPENDIX 1. Loading Zebu
-
- To load the Zebu system, first load the file Zebu.scm from the
- directory where Zebu is stored:
-
- (load "... /zebu.scm")
-
-
- then use the function
-
- (load-zebu)
-
- to load Zebu.
-
-
-
- APPENDIX 2. Function Summary
-
- ;;; Load the zebu bootstrap file:
- (load "... /zebu.scm")
-
-
-
-
- -- 5 --
-
-
-
-
-
-
-
- ;;; Load the Zebu system.
- (load-zebu)
-
- ;;; Compile an slr(1) grammar:
- (compile-slr-grammar <grammar file> <parse table file>)
-
- ;;; Compile an lalr(1) grammar:
- (compile-lalr1-grammar <grammar file> <parse table file>)
-
- ;;; Load a parse table file into Zebu's parsing engine:
- (load-parse-tables <parse-table-file>)
-
- ;;; Parse some tokens from a list:
- (list-parser '< a list of tokens ... >)
-
- ;;; Parse some tokens from the scheme function (read)
- ;;; (this will detect end of tokens at end of file or upon reading
- ;;; the scheme symbol :eof
- (read-parser)
-
-
-
- APPENDIX 3. Interactive Calculator Example
-
- Here is a sample session that demonstrates an interactive
- calculator implemented with Zebu.
-
- ;;; Load up the parse table file describing the calculator.
- 1 ]=> (load-parse-tables "calc.tab")
- ;Value: ((I . 4) (E . 5) (A . 6))
-
- ;;; Load the file containing the rest of the calculator.
- 1 ]=> (load "calc.scm")
- ;Value: CALC
-
- ;;; Do some calc-ulating...
- 1 ]=> (calc)
- ? 1 + 2 * 3 :
- 7
- ? 5 -> jed :
- 5
- ? jed :
- 5
- ? 1 + 3 -> fred -> ned :
- 4
- ? ned * jed :
- 20
- ? :eof
- Bye now.
- ;No value
-
-
- Here is the content of calc.scm:
-
-
-
-
-
- -- 6 --
-
-
-
-
-
-
-
-
- (define bindings '())
-
- (define (calc)
- (display "? ")
- (read-parser))
-
-
-
- And here is the grammar file describing the calculator:
-
- ;;; This grammar and the associated lambdas define
- ;;; a simple interactive calculator.
-
-
- ;;; A session is a sequence of transactions with the user.
- ;;; When we reduce by this production, we're all done.
- (session transaction-sequence)
- (lambda (ignore) (display "Bye now."))
-
- ;;; A transaction-sequence is a sequence of transactions separated
- ;;; by colons
- (transaction-sequence transaction-sequence transaction :)
- (lambda (a b c) 'dont-care)
-
- ;;; There might be no transactions.
- (transaction-sequence)
- (lambda () 'not-used)
-
- ;;; When the parser carries out the reduction by this
- ;;; production, it causes the value of the expression
- ;;; to be printed, and a new prompt to be shown.
-
- (transaction expression)
- (lambda (value) (display value) (newline) (display "? "))
-
-
-
- ;;; The assignment operator. Store the built up value of
- ;;; the expression under the identifier. The id argument
- ;;; to the lambda is supplied with the symbol seen in the
- ;;; token stream. The lambda arranges to store away the
- ;;; expression value under the symbol (in an alist).
- (expression expression -> identifier)
- (lambda (exp arrow id)
- (set! bindings (cons `(,id . ,exp) bindings))
- exp)
-
-
- ;;; Implement addition: When reduction occurs by this
- ;;; production, we perform addition on the previously
- ;;; built up sub expression values.
- (expression expression + term)
- (lambda (exp plus trm) (+ exp trm))
-
-
-
-
- -- 7 --
-
-
-
-
-
-
-
- ;;; Similar to addition.
- (expression expression - term)
- (lambda (exp minus trm) (- exp trm))
-
- ;;; Arrange for typical distributive properties...
- (expression term)
- (lambda (x) x)
-
- ;;; Implement multiplication. Like addition.
- (term term * factor)
- (lambda (trm mult fac) (* trm fac))
-
- ;;; Division.
- (term term / factor)
- (lambda (trm div fac) (/ trm fac))
-
- (term factor)
- (lambda (x) x)
-
-
- ;;; Do paren style grouping using < and > .
- (factor < expression > )
- (lambda (open-bracket exp close-bracket)
- exp)
-
- ;;; The terminal symbol number. When the parser driver
- ;;; applies this lambda it supplies the value of the number
- ;;; as seen in the token stream.
- (factor number)
- (lambda (x) x)
-
- ;;; The terminal symbol "identifier". When the parser driver
- ;;; applies this lambda, it supplies the symbol as seen in the
- ;;; token stream. The value associated with this name is looked
- ;;; up by this lambda.
- (factor identifier)
- (lambda (id)
- (cdr (assq id bindings)))
-
-
-
- REFERENCES
-
- [1] S.C. Johnson. YACC -- Yet Another Compiler Compiler.
- Computing Science Technical Report 32, AT&T Bell Laboratories,
- Murray Hill, NJ, 1975
-
- [2] J. Rees and W. Clinger (eds). Revised^3 Report on the
- Algorithmic Language Scheme. MIT AI Laboratory AI Memo 848a,
- 1986
-
- [3] A.V. Aho and J.D. Ullman. PRINCIPLES OF COMPILER DESIGN,
- Addison Wesley 1979
-
-
-
-
-
- -- 8 --
-
-